Matrix Decomposition of Spectrograms

Chris Tralie

In [1]:
%load_ext autoreload
%autoreload 2
import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipd
import pandas as pd
from scipy import signal
from spectrogramtools import *
from collections import OrderedDict

Let's suppose we have a column matrix

$ A = \left[ \begin{array}{c} 1 \\ 2 \\ 5 \end{array} \right] $

and a row matrix

$ B = \left[ \begin{array}{cccccccccc} 1 & 0 & 0 & 2 & 0 & 0 & 3 & 0 & 0 & 1 \end{array} \right] $

Then their product $AB$ is the following $3 \times 10$ matrix:

In [2]:
A = np.array([[1], [2], [5]])
B = np.array([[1, 0, 0, 2, 0, 0, 3, 0, 0, 1]])
print(A.dot(B))
[[ 1  0  0  2  0  0  3  0  0  1]
 [ 2  0  0  4  0  0  6  0  0  2]
 [ 5  0  0 10  0  0 15  0  0  5]]
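
One way to read this product: every column of the result is the column $A$ scaled by the corresponding entry of $B$, so all columns are multiples of a single vector. The sketch below checks this with plain NumPy, using the same values as the cell above; the names `P` and `rank` are introduced here only for illustration.

```python
import numpy as np

A = np.array([[1], [2], [5]])                   # 3 x 1 column
B = np.array([[1, 0, 0, 2, 0, 0, 3, 0, 0, 1]])  # 1 x 10 row
P = A.dot(B)                                    # 3 x 10 product

# Column j of the product is A scaled by B[0, j]
for j in range(B.shape[1]):
    assert np.array_equal(P[:, j], B[0, j] * A[:, 0])

# Every column is a multiple of the same vector, so the rank is 1
rank = np.linalg.matrix_rank(P)
print(P.shape, rank)
```

This is the same structure used in the rest of the notebook: one spectrogram column plays the role of $A$, and a row of activations plays the role of $B$.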
In [3]:
y_voice, sr = librosa.load("princevoice.wav", sr=22050)
ipd.Audio(y_voice, rate=sr)
Out[3]:
In [4]:
y_drum, sr = librosa.load("princedrum.wav", sr=22050)
ipd.Audio(y_drum, rate=sr)
Out[4]:
In [5]:
win_length = 2048*3
hop_length = 512
# Keep only the first STFT window of the voice, stored as an M x 1 column
SVoice = STFT(y_voice, win_length, hop_length, useLibrosa=False)
SVoice = SVoice[:, 0]
SVoice = np.reshape(SVoice, (SVoice.size, 1))
# Do the same for the drum
SDrum = STFT(y_drum, win_length, hop_length, useLibrosa=False)
SDrum = SDrum[:, 0]
SDrum = np.reshape(SDrum, (SDrum.size, 1))
In [6]:
plt.subplot(121)
librosa.display.specshow(librosa.amplitude_to_db(np.abs(SDrum), ref=np.max), x_axis='time')
plt.title("Drum")
plt.subplot(122)
librosa.display.specshow(librosa.amplitude_to_db(np.abs(SVoice), ref=np.max), x_axis='time')
plt.title("Voice")
Out[6]:
Text(0.5, 1.0, 'Voice')
In [7]:
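H = np.zeros((1, 1000))
N = 40  # Number of hits
T = 20  # Spacing between hits, in spectrogram columns
H[0, 0:N*T:T] = 1/(1+np.arange(N))  # Hit k has amplitude 1/(1+k)
S = SVoice.dot(H)  # Copy the voice column to each hit, scaled by its amplitude
y = iSTFT(S, win_length, hop_length)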

plt.subplot(211)
plt.plot(H[0, :])
plt.subplot(212)
plt.imshow(np.log(1 + np.abs(S)/1e-3), aspect='auto', cmap='magma_r')
plt.gca().invert_yaxis()

ipd.Audio(y, rate=sr)
Out[7]:
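
For intuition about the timing in the cell above, the following sketch lists where the hits fall and converts the spacing to seconds. It assumes that column k of the spectrogram starts about `k * hop_length` samples into the audio; this is the usual STFT convention, but it is an assumption here, since the source does not show how `iSTFT` in `spectrogramtools` places its frames. The names `hit_columns` and `period_sec` are illustrative.

```python
import numpy as np

sr = 22050        # Sample rate used when loading the audio
hop_length = 512  # Samples between consecutive spectrogram columns
N, T = 40, 20     # Number of hits and spacing in columns

H = np.zeros((1, 1000))
H[0, 0:N*T:T] = 1/(1+np.arange(N))  # Hit k has amplitude 1/(1+k)

hit_columns = np.flatnonzero(H[0])  # Columns with a nonzero activation
period_sec = T * hop_length / sr    # Assumed time between hits

print(hit_columns[:4], H[0, hit_columns[:4]])
print(round(period_sec, 3), "seconds between hits")
```

Under that assumption the hits repeat a little under half a second apart, and each one is quieter than the one before.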
In [8]:
H = np.zeros((1, 500))
H[0, 0::20] = 1
V = SDrum.dot(H)
y = iSTFT(V, win_length, hop_length)
ipd.Audio(y, rate=sr)
Out[8]:

A 4 on 3 Rhythm

In [9]:
H = np.zeros((1, 500))
H[0, 0::20] = 1
H[0, 0::15] = 1
V = SVoice.dot(H)
y = iSTFT(V, win_length, hop_length)
ipd.Audio(y, rate=sr)
Out[9]:
In [10]:
plt.plot(H[0, :])
Out[10]:
[<matplotlib.lines.Line2D at 0x7f3ba74a1e10>]
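
The name follows from the two periods. Hits every 20 columns and hits every 15 columns line up every 60 columns (their least common multiple), and within each 60-column cycle one stream has 3 hits while the other has 4. A small NumPy check, where `cycle`, `slow`, `fast`, and `hits` are names chosen for this sketch:

```python
import numpy as np

H = np.zeros((1, 500))
H[0, 0::20] = 1
H[0, 0::15] = 1

cycle = np.lcm(20, 15)          # Columns until the two patterns realign
slow = np.arange(0, cycle, 20)  # Hits of the period-20 stream in one cycle
fast = np.arange(0, cycle, 15)  # Hits of the period-15 stream in one cycle
hits = np.flatnonzero(H[0, :cycle])  # All hits actually present in one cycle

print(cycle, len(slow), len(fast), hits)
```

Because the second assignment overwrites rather than adds, a column shared by both streams (such as column 0) keeps an activation of 1 instead of 2.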

A 4 on 3 Rhythm with Different Instruments

We create a 4 on 3 rhythm where the 3 is the voice and the 4 is the drum by doing two separate matrix multiplications and adding the results together.

In [11]:
H1 = np.zeros((1, 500))
H1[0, 0::20] = 1
H2 = np.zeros((1, 500))
H2[0, 0::15] = 1
V = SVoice.dot(H1) + SDrum.dot(H2)

y = iSTFT(V, win_length, hop_length)
ipd.Audio(y, rate=sr)
Out[11]:

But there's a more elegant way to do this. First, we create a matrix with two columns (`SBoth` in the code below), where the first column is the voice spectrogram window and the second is the drum spectrogram window.

In [12]:
SBoth = np.concatenate((SVoice, SDrum), axis=1)
librosa.display.specshow(librosa.amplitude_to_db(np.abs(SBoth), ref=np.max), x_axis='time')
Out[12]:
<matplotlib.collections.QuadMesh at 0x7f3ba74933d0>

We then create a matrix $H$ with two rows, where the first row holds the activations of the voice and the second row holds the activations of the drum. A single multiplication of the $M \times 2$ and $2 \times N$ matrices now produces the whole spectrogram. The matrix $H$ can be thought of as a little musical score in which each row says when one instrument is active.

In [13]:
H = np.zeros((2, 500))
H[0, 0::20] = 1 # First row corresponds to activations of first column
H[1, 0::15] = 1 # Second row corresponds to activations of second column
V = SBoth.dot(H)
y = iSTFT(V, win_length, hop_length)
plt.figure(figsize=(12, 4))
plt.imshow(H, aspect='auto', interpolation='none')
ipd.Audio(y, rate=sr)
Out[13]:
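
The single product gives the same result as adding two separate products, because multiplying an $M \times 2$ matrix by a $2 \times N$ matrix sums one column-times-row product per instrument. The sketch below checks this numerically. It uses small random complex columns as hypothetical stand-ins for `SVoice` and `SDrum`, so that it runs without the audio files or `spectrogramtools`; `M = 8` is an arbitrary size chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 8  # Small stand-in for the number of frequency bins

# Hypothetical stand-ins for the voice and drum columns (complex, M x 1)
SVoice = rng.standard_normal((M, 1)) + 1j*rng.standard_normal((M, 1))
SDrum = rng.standard_normal((M, 1)) + 1j*rng.standard_normal((M, 1))

H = np.zeros((2, 500))
H[0, 0::20] = 1  # Voice activations
H[1, 0::15] = 1  # Drum activations

# Two separate products, added together
V_sum = SVoice.dot(H[0:1, :]) + SDrum.dot(H[1:2, :])

# One product with the two columns placed side by side
SBoth = np.concatenate((SVoice, SDrum), axis=1)
V_single = SBoth.dot(H)

print(V_single.shape, np.allclose(V_sum, V_single))
```

The same pattern extends to more instruments: each extra sound adds one column to the left matrix and one row of activations to $H$.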